Superlinear Parallelization of k-Nearest Neighbor Retrieval
Authors
Abstract
With m processors available, the k-nearest neighbor classifier can be straightforwardly parallelized with a linear speed increase of factor m. In this paper we introduce two methods that in principle are able to achieve this aim. The first method splits the test set into m parts, while the other distributes the training set over m sub-classifiers, and merges their m nearest neighbor sets with each classification. For our experiments we use TIMBL, an implementation of the k-NN classifier that uses a decision-tree structure for retrieving nearest neighbors, and that employs feature weighting. While the first method consistently scales linearly, with the second method we observe cases of both superlinear and sublinear scaling. Analysis shows that superlinear scaling can occur with datasets whose feature weights exhibit a low variance; retrieval of nearest neighbors from the tree structure becomes exponentially slower with more data. Hence, the retrieval of classifications from m sub-classifier decision structures, each built on a 1/m-th part of the training set, can be substantially more than m times faster.
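The second method described above can be sketched as follows. This is a minimal illustrative implementation, not the TIMBL tree-based retrieval from the paper: each sub-classifier holds a 1/m-th partition of the training set and returns its own k nearest neighbors, after which the m candidate sets are merged into a global top-k. All function and variable names are hypothetical.

```python
import heapq
from math import dist

def knn_one(train, query, k):
    """Brute-force k nearest neighbors of `query` within one partition."""
    return heapq.nsmallest(k, ((dist(x, query), y) for x, y in train))

def knn_distributed(train, query, k, m):
    """Split `train` over m sub-classifiers, query each, merge the m sets."""
    parts = [train[i::m] for i in range(m)]          # round-robin partition
    candidates = []
    for part in parts:                               # each iteration could run
        candidates.extend(knn_one(part, query, k))   # on its own processor
    return heapq.nsmallest(k, candidates)            # global top-k of m*k items

# The merged result matches querying the full training set in one pass:
train = [((0.0, 0.0), 'a'), ((1.0, 0.0), 'b'),
         ((0.0, 2.0), 'a'), ((3.0, 3.0), 'b')]
assert knn_distributed(train, (0.1, 0.1), k=2, m=2) == knn_one(train, (0.1, 0.1), 2)
```

The merge step is cheap (m·k candidates per query), so the observed scaling behavior is dominated by how sub-classifier retrieval cost grows with partition size, which is where the paper's superlinear effect arises.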
Similar papers
Superlinear parallelisation of the k-nearest neighbor classifier
With m processors available, the k-nearest neighbor classifier can be straightforwardly parallelized with a linear speed increase of factor m. In this paper we introduce two methods that in principle can achieve this aim. The first method splits the test set into m parts, while the other distributes the training set over m sub-classifiers, and merges their m nearest neighbor sets with each classif...
FUZZY K-NEAREST NEIGHBOR METHOD TO CLASSIFY DATA IN A CLOSED AREA
Clustering of objects is an important area of research and application in a variety of fields. In this paper we present an effective technique for data clustering and apply it to data in a closed area. We compare this method with the K-nearest neighbor and K-means methods.
A Parallel Algorithms on Nearest Neighbor Search
(k-)Nearest neighbor searching has very high computational costs. The algorithms presented for nearest neighbor search in high-dimensional spaces have suffered from the curse of dimensionality, which severely affects either the runtime or the storage requirements of the algorithms. Parallelization of nearest neighbor search is a suitable solution for decreasing the workload caused by nearest neigh...
Asymptotic Behaviors of Nearest Neighbor Kernel Density Estimator in Left-truncated Data
Kernel density estimators are the basic tools for density estimation in non-parametric statistics. The k-nearest neighbor kernel estimators represent a special form of kernel density estimators, in which the bandwidth is varied depending on the location of the sample points. In this paper, we initially introduce the k-nearest neighbor kernel density estimator in the random left-truncatio...
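The location-dependent bandwidth mentioned above can be written out explicitly. The following is the standard textbook form of the k-nearest neighbor kernel density estimator, not a formula taken from the cited paper:

$$\hat{f}_n(x) \;=\; \frac{1}{n\,R_k(x)} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{R_k(x)}\right),$$

where $K$ is a kernel function, $X_1,\dots,X_n$ are the sample points, and the bandwidth $R_k(x)$ is the distance from $x$ to its $k$-th nearest sample point, so that the estimator adapts its smoothing to the local density of the data.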
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to all kinds of library resources. However, classifying documents within a large amount of data remains a challenge and demands time and effort to find particular documents. Grouping similar documents into specific classes can reduce the time needed to search for the required data, particularly for text documents. This is further facilitated by using Artificial...